Transitioning Existing Content: inferring organisation-specific documents

Authors

  • Arijit Sengupta
  • Sandeep Purao
Abstract

A definition for a document type within an organization represents an organizational norm about the way the organizational actors represent products and supporting evidence of organizational processes. Generating a good organization-specific document structure is, therefore, important since it can capture a shared understanding among the organizational actors about how certain business processes should be performed. Current tools that generate document type definitions focus on the underlying technology, emphasizing tags created in a single instance document. The tools, thus, fall short of capturing the shared understanding between organizational actors about how a given document type should be represented. We propose a method for inferring organization-specific document structures using multiple instance documents as inputs. The method consists of heuristics that combine individual document definitions, which may have been compiled using standard algorithms. We propose a number of heuristics utilizing artificial intelligence and natural language processing techniques. As the research progresses, the heuristics will be tested on a suite of test cases representing multiple instance documents for different document types. The complete methodology will be implemented as a research prototype.
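
The abstract does not spell out the heuristics themselves, so the following is only a minimal sketch of the general idea of combining per-document definitions into one structure. It assumes the instance documents are XML, and it uses a simple frequency rule (a child element seen in every instance is required, otherwise optional; seen more than once under one parent makes it repeatable) as a stand-in for the authors' heuristics. The function names `infer_structure` and `merge_structures` and the DTD-style occurrence markers are illustrative choices, not part of the paper.

```python
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict


def infer_structure(xml_text):
    """Per-document step: map each element tag to a list of Counters,
    one Counter of child tags per occurrence of that element."""
    root = ET.fromstring(xml_text)
    structure = defaultdict(list)
    for elem in root.iter():
        structure[elem.tag].append(Counter(child.tag for child in elem))
    return structure


def merge_structures(structures):
    """Combination step (hypothetical frequency heuristic): pool the
    observations from all instance documents and assign each child a
    DTD-style occurrence marker: "" (exactly one), "?" (optional),
    "+" (one or more), "*" (zero or more)."""
    pooled = defaultdict(list)
    for structure in structures:
        for tag, counters in structure.items():
            pooled[tag].extend(counters)

    markers = {
        (True, False): "",
        (False, False): "?",
        (True, True): "+",
        (False, True): "*",
    }
    merged = {}
    for tag, counters in pooled.items():
        children = sorted({c for counter in counters for c in counter})
        rules = {}
        for child in children:
            counts = [counter[child] for counter in counters]
            required = min(counts) >= 1    # present in every observed parent
            repeatable = max(counts) > 1   # repeated under at least one parent
            rules[child] = markers[(required, repeatable)]
        merged[tag] = rules
    return merged


if __name__ == "__main__":
    # Two made-up instance documents of the same (hypothetical) memo type.
    docs = [
        "<memo><to>A</to><to>B</to><subject>x</subject><body>t</body></memo>",
        "<memo><to>A</to><subject>y</subject><cc>C</cc><body>t</body></memo>",
    ]
    merged = merge_structures([infer_structure(d) for d in docs])
    for parent, rules in merged.items():
        print(parent, rules)
```

A rule this simple ignores element order and element names that differ only in wording, which is presumably where the artificial intelligence and natural language processing heuristics mentioned in the abstract would come in.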

Similar resources

Comparison of strategic planning documents in selected public universities of the country

Abstract: The purpose of this study is to compare the official documents of strategic planning in selected public universities of the country in order to answer these two main questions: what are the differences and similarities between the formal and content elements of strategic planning documents of the selected public universities? Is there a unique design in these documents that fits the s...

Consistency Management of Distributed Documents using XML and Related Technologies

In this paper we describe an approach and associated techniques for managing consistency of distributed documents. We give an account of a toolkit which demonstrates the approach. The approach supports the management of consistency of documents with Internet-scale distribution. It takes advantage of XML (eXtensible Markup Language) and related technologies. The paper contains a brief account of...

Browser Extension TO Removing Dust Using Sequence Alignment and Content Matching

Abstract: If documents of two URLs are similar, then they are called DUST. Similarly, detection of near duplicate documents is complex. The duplicate documents content will be similar but there will be small differences in the content. Different URLs with sa...

Extractive Multi-Document Summaries Should Explicitly Not Contain Document Specific Content

Unsupervised approaches to multi-document summarization consist of two steps: finding a content model of the documents to be summarized, and then generating a summary that best represents the most salient information of the documents. In this paper, we present a sentence selection objective for extractive summarization in which sentences are penalized for containing content that is specific to ...

Inferring Sources of Leaks in Document Management Systems

A document management system (DMS) provides for secure operations on a distributed repository of digital documents. This paper presents a two-phase approach to address the problem of locating the sources of information leaks in a DMS. The initial monitoring phase treats user interactions in a DMS as a series of transactions, each involving content manipulation by a user; in addition to standard...

Journal title:
  • Australasian J. of Inf. Systems

Volume 8

Publication date: 2000